{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "af78d49e",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/query_engine/recursive_retriever_agents.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43497beb-817d-4366-9156-f4d7f0d44942",
   "metadata": {},
   "source": [
    "# Recursive Retriever + Document Agents\n",
    "\n",
    "This guide shows how to combine recursive retrieval and \"document agents\" for advanced decision making over heterogeneous documents.\n",
    "\n",
    "There are two motivating factors that lead to solutions for better retrieval:\n",
    "- Decoupling retrieval embeddings from chunk-based synthesis. Oftentimes fetching documents by their summaries will return more relevant context to queries rather than raw chunks. This is something that recursive retrieval directly allows.\n",
    "- Within a document, users may need to dynamically perform tasks beyond fact-based question-answering. We introduce the concept of \"document agents\" - agents that have access to both vector search and summary tools for a given document."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9be00aba-b6c5-4940-9825-81c5d2cd2f0b",
   "metadata": {},
   "source": [
    "### Setup and Download Data\n",
    "\n",
    "In this section, we'll define imports and then download Wikipedia articles about different cities. Each article is stored separately."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d744565c",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5543a20d",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e41e9905-77a9-44c5-88ac-c7a4d08a4612",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    VectorStoreIndex,\n",
    "    SummaryIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    ServiceContext,\n",
    ")\n",
    "from llama_index.schema import IndexNode\n",
    "from llama_index.tools import QueryEngineTool, ToolMetadata\n",
    "from llama_index.llms import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4343cf7-eec9-4a67-b5be-c72dbe3280a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "wiki_titles = [\"Toronto\", \"Seattle\", \"Chicago\", \"Boston\", \"Houston\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d261882-6793-4eca-ad93-d94d2061e388",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import requests\n",
    "\n",
    "for title in wiki_titles:\n",
    "    response = requests.get(\n",
    "        \"https://en.wikipedia.org/w/api.php\",\n",
    "        params={\n",
    "            \"action\": \"query\",\n",
    "            \"format\": \"json\",\n",
    "            \"titles\": title,\n",
    "            \"prop\": \"extracts\",\n",
    "            # 'exintro': True,\n",
    "            \"explaintext\": True,\n",
    "        },\n",
    "    ).json()\n",
    "    page = next(iter(response[\"query\"][\"pages\"].values()))\n",
    "    wiki_text = page[\"extract\"]\n",
    "\n",
    "    data_path = Path(\"data\")\n",
    "    if not data_path.exists():\n",
    "        Path.mkdir(data_path)\n",
    "\n",
    "    with open(data_path / f\"{title}.txt\", \"w\") as fp:\n",
    "        fp.write(wiki_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bf0c13b-0d77-43e8-8c1c-84258299a494",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load all wiki documents\n",
    "city_docs = {}\n",
    "for wiki_title in wiki_titles:\n",
    "    city_docs[wiki_title] = SimpleDirectoryReader(\n",
    "        input_files=[f\"data/{wiki_title}.txt\"]\n",
    "    ).load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6189aaf4-2eb7-40bc-9e83-79ce4f221b4b",
   "metadata": {},
   "source": [
    "Define LLM + Service Context + Callback Manager"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38aed7e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd6e5e48-91b9-4701-a85d-d98c92323350",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
    "service_context = ServiceContext.from_defaults(llm=llm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "976cd798-2e8d-474c-922a-51b12c5c6f36",
   "metadata": {},
   "source": [
    "## Build Document Agent for each Document\n",
    "\n",
    "In this section we define \"document agents\" for each document.\n",
    "\n",
    "First we define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.\n",
    "\n",
    "This document agent can dynamically choose to perform semantic search or summarization within a given document.\n",
    "\n",
    "We create a separate document agent for each city."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eacdf3a7-cfe3-4c2b-9037-b28a065ed148",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.agent import OpenAIAgent\n",
    "\n",
    "# Build agents dictionary\n",
    "agents = {}\n",
    "\n",
    "for wiki_title in wiki_titles:\n",
    "    # build vector index\n",
    "    vector_index = VectorStoreIndex.from_documents(\n",
    "        city_docs[wiki_title], service_context=service_context\n",
    "    )\n",
    "    # build summary index\n",
    "    summary_index = SummaryIndex.from_documents(\n",
    "        city_docs[wiki_title], service_context=service_context\n",
    "    )\n",
    "    # define query engines\n",
    "    vector_query_engine = vector_index.as_query_engine()\n",
    "    list_query_engine = summary_index.as_query_engine()\n",
    "\n",
    "    # define tools\n",
    "    query_engine_tools = [\n",
    "        QueryEngineTool(\n",
    "            query_engine=vector_query_engine,\n",
    "            metadata=ToolMetadata(\n",
    "                name=\"vector_tool\",\n",
    "                description=(\n",
    "                    f\"Useful for retrieving specific context from {wiki_title}\"\n",
    "                ),\n",
    "            ),\n",
    "        ),\n",
    "        QueryEngineTool(\n",
    "            query_engine=list_query_engine,\n",
    "            metadata=ToolMetadata(\n",
    "                name=\"summary_tool\",\n",
    "                description=(\n",
    "                    \"Useful for summarization questions related to\"\n",
    "                    f\" {wiki_title}\"\n",
    "                ),\n",
    "            ),\n",
    "        ),\n",
    "    ]\n",
    "\n",
    "    # build agent\n",
    "    function_llm = OpenAI(model=\"gpt-3.5-turbo-0613\")\n",
    "    agent = OpenAIAgent.from_tools(\n",
    "        query_engine_tools,\n",
    "        llm=function_llm,\n",
    "        verbose=True,\n",
    "    )\n",
    "\n",
    "    agents[wiki_title] = agent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "899ca55b-0c02-429b-a765-8e4f806d503f",
   "metadata": {},
   "source": [
    "## Build Composable Retriever over these Agents\n",
    "\n",
    "Now we define a set of summary nodes, where each node links to the corresponding Wikipedia city article. We then define a composable retriever + query engine on top of these Nodes to route queries down to a given node, which will in turn route it to the relevant document agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6884ff15-bf40-4bdd-a1e3-58cbd056a12a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define top-level nodes\n",
    "objects = []\n",
    "for wiki_title in wiki_titles:\n",
    "    # define index node that links to these agents\n",
    "    wiki_summary = (\n",
    "        f\"This content contains Wikipedia articles about {wiki_title}. Use\"\n",
    "        \" this index if you need to lookup specific facts about\"\n",
    "        f\" {wiki_title}.\\nDo not use this index if you want to analyze\"\n",
    "        \" multiple cities.\"\n",
    "    )\n",
    "    node = IndexNode(\n",
    "        text=wiki_summary, index_id=wiki_title, obj=agents[wiki_title]\n",
    "    )\n",
    "    objects.append(node)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eabd221-3c24-468b-9dbe-19689faf57fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define top-level retriever\n",
    "vector_index = VectorStoreIndex(\n",
    "    objects=objects, service_context=service_context\n",
    ")\n",
    "query_engine = vector_index.as_query_engine(similarity_top_k=1, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dedb927-a992-4f21-a0fb-4ce4361adcb3",
   "metadata": {},
   "source": [
    "## Running Example Queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e743c62-7dd8-4ac9-85a5-f1cbc112a79c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1;3;38;2;11;159;203mRetrieval entering Boston: OpenAIAgent\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object OpenAIAgent with query Tell me about the sports teams in Boston\n",
      "\u001b[0mAdded user message to memory: Tell me about the sports teams in Boston\n"
     ]
    }
   ],
   "source": [
    "# should use Boston agent -> vector tool\n",
    "response = query_engine.query(\"Tell me about the sports teams in Boston\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4ce2a76-5779-4acf-9337-69109dae7fd6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Boston is home to several professional sports teams across different leagues. These teams include the Boston Red Sox in Major League Baseball, the New England Patriots in the National Football League, the Boston Celtics in the NBA, the Boston Bruins in the NHL, and the New England Revolution in Major League Soccer. These teams have a rich history and are widely supported by fans in Boston and across the country.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa3d98ab-cb82-4473-ab2b-bc8a17e1b86a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1;3;38;2;11;159;203mRetrieval entering Houston: OpenAIAgent\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object OpenAIAgent with query Tell me about the sports teams in Houston\n",
      "\u001b[0mAdded user message to memory: Tell me about the sports teams in Houston\n"
     ]
    }
   ],
   "source": [
    "# should use Houston agent -> vector tool\n",
    "response = query_engine.query(\"Tell me about the sports teams in Houston\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d476c54b-98af-4d2a-8f17-4baa37d0d360",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Houston is home to several professional sports teams across different leagues. The city has a professional football team called the Houston Texans, a professional basketball team called the Houston Rockets, a professional baseball team called the Houston Astros, a professional soccer team called the Houston Dynamo, and a professional women's soccer team called the Houston Dash. These teams compete in the National Football League (NFL), National Basketball Association (NBA), Major League Baseball (MLB), Major League Soccer (MLS), and National Women's Soccer League (NWSL) respectively. Houston also has minor league baseball, hockey, and other sports teams, making it a city with a rich sports culture.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee6ef20c-3ccc-46c3-ad87-667138d78d5d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1;3;38;2;11;159;203mRetrieval entering Chicago: OpenAIAgent\n",
      "\u001b[0m\u001b[1;3;38;2;237;90;200mRetrieving from object OpenAIAgent with query Give me a summary on all the positive aspects of Chicago\n",
      "\u001b[0mAdded user message to memory: Give me a summary on all the positive aspects of Chicago\n",
      "=== Calling Function ===\n",
      "Calling function: summary_tool with args: {\n",
      "  \"input\": \"positive aspects of Chicago\"\n",
      "}\n",
      "Got output: Chicago is a vibrant city with a diverse economy and a wide range of industries. It serves as a major hub for finance, culture, commerce, industry, education, technology, telecommunications, and transportation. The city has a thriving arts and music scene, making significant contributions to visual arts, literature, film, theater, comedy, food, dance, and various music genres. Chicago is also known for its prestigious universities, including the University of Chicago, Northwestern University, and the University of Illinois Chicago. Furthermore, it is home to professional sports teams in all major leagues.\n",
      "========================\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# should use Seattle agent -> summary tool\n",
    "response = query_engine.query(\n",
    "    \"Give me a summary on all the positive aspects of Chicago\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cfe1dd4c-8bfd-43d0-99bc-ca60861dc418",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Chicago is a vibrant city with a diverse economy and a wide range of industries. It serves as a major hub for finance, culture, commerce, industry, education, technology, telecommunications, and transportation. The city has a thriving arts and music scene, making significant contributions to visual arts, literature, film, theater, comedy, food, dance, and various music genres. Chicago is also known for its prestigious universities, including the University of Chicago, Northwestern University, and the University of Illinois Chicago. Furthermore, it is home to professional sports teams in all major leagues.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
